Information (including personal information) collected by agencies for the administration 
of programs, policies or services and with the potential to be used for statistical purposes. 
Administrative data is one type of microdata. 


Aggregate data are produced by grouping information into categories and aggregating 
values within these categories. For example, a count of the number of people of a particular 
age (obtained from the question ‘In what year were you born?’). 


Aggregate data is typically presented in tables. Aggregate data is also referred to as tabular 
data or macrodata. 
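
As an illustration of the aggregation described above, the following minimal sketch groups unit-record birth years into counts by age. The function name, the sample values and the reference year are hypothetical, and age is approximated as reference year minus birth year.

```python
from collections import Counter

def aggregate_by_age(birth_years, reference_year):
    """Count people at each age, given one birth year per unit record.

    Age is approximated as reference_year - birth_year (an assumption
    made for illustration; it ignores the day and month of birth).
    """
    return Counter(reference_year - year for year in birth_years)

# Hypothetical answers to 'In what year were you born?'
birth_years = [1980, 1980, 1975, 1990, 1975, 1980]
age_counts = aggregate_by_age(birth_years, reference_year=2010)

for age in sorted(age_counts):
    print(age, age_counts[age])
```

Each printed row corresponds to one cell of an aggregate table: a category (age) and the number of people in it.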


This term is most commonly used to refer to data from which direct identifiers have been 
removed (de-identified data), but it is sometimes also used to refer to confidentialised data. 
To avoid confusion the more specific terms de-identified data and confidentialised data 
are used in the Confidentiality Information Series. 


See Cell dominance rule. 


A rule commonly applied to cells in a table to assess whether a cell may enable identification. 
The cell dominance rule (also called the cell concentration rule) is used to identify cells where 
a small number of data providers contribute a large percentage to the cell. If a cell fails this rule 
further investigation or action is needed to ensure that identification is unlikely. 


For more information see Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic 
principles’ 
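
A minimal sketch of a cell dominance check follows. The parameters n (how many top contributors to consider) and k (the share of the cell total that counts as dominant) are assumptions chosen for illustration; actual values are set by each data custodian and are not given in this glossary.

```python
def fails_dominance_rule(contributions, n=2, k=0.9):
    """Return True if the n largest contributors account for more than
    a share k of the cell total.

    n and k are illustrative assumptions, not prescribed values.
    """
    total = sum(contributions)
    if total <= 0:
        return False  # nothing to dominate in an empty or zero cell
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > k * total

# Hypothetical cell: turnover reported by four businesses
dominated_cell = [900, 50, 30, 20]
balanced_cell = [300, 250, 250, 200]

print(fails_dominance_rule(dominated_cell))
print(fails_dominance_rule(balanced_cell))
```

A cell that fails the check is not necessarily disclosive; as stated above, it is flagged for further investigation or action.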


Data that will allow identification of an individual or organisation, either directly or indirectly. 


To remove or alter information, or collapse detail within a dataset, to ensure that no person or 
organisation is likely to be identified in the data (directly or indirectly). 


For more information see Confidentiality Information Sheet 1: ‘Confidentiality: what is it and 
why is it important?’ and Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic 
principles.’ 


Confidentialised Unit Record Files (CURFs) are files containing microdata that have been 
de-identified and modified to protect individuals or organisations from either direct or indirect 
identification. 


‘An obligation to the provider of information to maintain the secrecy of that information’ 


Source: UN Economic Commission for Europe, 2009. 


For more information see Confidentiality Information Sheet 1: ‘Confidentiality: what is it and 
why is it important?’ 


For more information about confidentiality, or to provide feedback on this series, 
please email: inquiries@nss.gov.au 
Rules that are applied to each cell in a table to identify table cells that pose a risk of identification 
(disclosure). Two common rules are the frequency rule and the cell dominance rule. 


The organisation or agency which is responsible for the collection, use and disclosure of 
information in a dataset. Data custodians have an obligation to keep the confidential information 
they are entrusted with secret. 


On-site data laboratories provide access to detailed microdata at a secure site controlled by the 
data custodian. 


See Perturbation. 


An individual, household, business or other organisation which supplies data either for statistical 
or administrative purposes. 


A technique used to confidentialise data. Data reduction methods aim to control or limit the 
amount of detail available to avoid identification of a particular individual or business. Data 
reduction methods include combining categories of information or suppressing information 
for unsafe cells. 


For more information about techniques to confidentialise data see: Confidentiality Information 
Sheet 4: ‘How to confidentialise data: the basic principles’ 
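
One data reduction method named above, combining categories, can be sketched as follows. The function name, the category labels and the mapping are hypothetical and serve only to show how detail is collapsed.

```python
def combine_categories(table, mapping):
    """Collapse detailed categories into broader ones and sum their counts.

    table:   dict of category -> count
    mapping: dict of detailed category -> broader category; categories
             not listed in the mapping are kept unchanged.
    """
    combined = {}
    for category, count in table.items():
        broad = mapping.get(category, category)
        combined[broad] = combined.get(broad, 0) + count
    return combined

# Hypothetical table with two small single-year age cells
detailed = {"18": 2, "19": 1, "20": 4}
collapsed = combine_categories(detailed, {"18": "18-19", "19": "18-19"})
print(collapsed)
```

The total count is preserved, but the two small cells are no longer shown separately.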


Involves slightly altering small cells in a table to ensure results from analysis based on the 
data are not significantly affected, but the original values cannot be known with certainty. 
Data rounding may be random or controlled. 


For more information about data rounding techniques see: Confidentiality Information Sheet 4: 
‘How to confidentialise data: the basic principles’ 
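
The sketch below illustrates one possible form of random rounding: each count is rounded to a neighbouring multiple of a base, with the chance of rounding up proportional to the remainder so that the expected value is unchanged. The base of 3, the function name and this particular scheme are assumptions for illustration; controlled rounding, which also keeps totals consistent, is not shown.

```python
import random

def random_round(value, base=3, rng=random):
    """Randomly round a non-negative integer count to a multiple of base.

    Values that are already multiples of base are left unchanged.
    Otherwise the value is rounded up with probability remainder / base
    and down otherwise, so the result is unbiased on average.
    """
    remainder = value % base
    if remainder == 0:
        return value
    lower = value - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower

# Hypothetical small cells; a seeded generator makes the run repeatable
rng = random.Random(0)
cells = [1, 2, 4, 6, 7]
rounded = [random_round(c, base=3, rng=rng) for c in cells]
print(rounded)
```

Every released value is a multiple of the base, so a reader cannot tell with certainty what the original count was.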


Data that have had any identifiers removed. May also be referred to as unidentified data. 


Occurs when a direct identifier is included with the data that can be used to establish the 
identity of an individual or organisation. 


Disclosure occurs when a person or an organisation recognises or learns something that they 
did not already know about another person or organisation through released data. 


Managing the risks of an individual or organisation being identified either directly or indirectly. 


A rule commonly applied to cells in a table to assess whether a cell may enable identification. 
The frequency rule (also called the threshold rule) sets a threshold value for the minimum 
number of individuals or businesses in any cell. Common threshold values are 3, 5 and 10. If a 
cell fails this rule further investigation or action is needed to ensure that identification is unlikely. 
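
A minimal sketch of a frequency rule check follows. The threshold of 5 is one of the common values mentioned above. Treating a cell as failing when its count is strictly below the threshold, and leaving empty cells unflagged, are assumptions made for illustration; practice varies between agencies.

```python
def cells_failing_frequency_rule(table, threshold=5):
    """Return the labels of cells whose count is below the threshold.

    table: dict of cell label -> number of contributors.
    Cells with a count of zero are not flagged here (an assumption).
    """
    return [cell for cell, count in table.items() if 0 < count < threshold]

# Hypothetical table of counts by region
counts = {"A": 12, "B": 2, "C": 0, "D": 5}
print(cells_failing_frequency_rule(counts, threshold=5))
print(cells_failing_frequency_rule(counts, threshold=10))
```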


Able to be identified either directly or indirectly. 


See Direct identification, Identified data and Indirect identification. 


Data that include an identifier. 


An identifier (direct identifier) is information that directly establishes the identity of an individual 
or organisation. Examples of identifiers are: name, address, driver's licence number, Medicare 
number and Australian Business Number. 


Occurs when the identity of an individual or organisation is disclosed, not through the use of 
direct identifiers, but through a combination of unique characteristics. 


See Aggregate data. 


Unit record data where each record represents observations for an individual or organisation. 
Unit record data may contain individual responses to questions on a survey questionnaire or 
administrative forms. For example, answers given to the question ‘In what year were you born?’ 


An unusual value that is correctly reported but is not typical of the rest of the population. 


‘Information or an opinion (including information or an opinion forming part of a database), 
whether true or not, and whether recorded in a material form or not, about an individual whose 
identity is apparent, or can reasonably be ascertained, from the information or opinion’ 

(Privacy Act 1988) 


Personal information is information that identifies, or could identify a person. There are some 
obvious examples of personal information, such as name or address. Personal information can 
also include medical records, bank account details, photos, videos, and even information about 
what a person likes, their opinions and where they work — basically, any information through 
which a person is reasonably identifiable. 


Information does not have to include a name to be personal information. For example, in some 
cases, date of birth and post code may be enough to identify someone. 


A technique used to confidentialise data. Perturbation is a data modification method that 
involves changing the data slightly to reduce the risk of disclosure while retaining as much 
content and structure as possible. 


Perturbation techniques include data rounding or data swapping. 


For further information about techniques to confidentialise data see: Confidentiality Information 
Sheet 4: ‘How to confidentialise data: the basic principles’ 


The individual's right to have their personal information managed so that it is kept confidential 
except where informed consent has been given to release the information, or a legal authority 
exists, in accordance with the requirements of the Privacy Act 1988. 


An individual, household, business or other organisation which provides data either to statistical 
collections or administrative collections. May also be referred to as a respondent. 


Occurs when a user tries to find a particular person or organisation with a set of characteristics 
known to the user. 


Remote access facilities are used by agencies around the world to enable approved researchers 
to submit data queries through a secure internet-based interface from their desktop. The 
request is run against a confidentialised unit record file, which is securely stored within the data 
custodian's computing environment. 


The presence of a rare characteristic in the data can pose an identification risk, depending on 
how remarkable (or extraordinary or noticeable) the characteristic is. This might include people 
in unusual jobs, very large families or young people with very high educational qualifications. 


In the context of confidentiality, risk management involves identification and management of 
the risk of disclosure in accordance with the impact and likelihood of a disclosure occurring and 
within the constraints provided by legislation and policies. 


Further information about the process of managing microdata disclosure risks is in Confidentiality 
Information Sheet 5 — ‘Confidentiality: managing the risk of disclosure in the release of microdata’ 


Safe storage and access to held data, including physical security of buildings and IT security. 


An identification made without any deliberate attempt. 


Purposes which support the collection, storage, compilation, analysis and transformation of 
data for the production of statistical output, and the dissemination of those outputs and the 
information describing them. 


This means that information cannot be used for administrative, regulatory, law enforcement 
or other purpose that affects the rights, privileges or benefits of particular individuals or 
organisations. 


Data suppression involves not releasing information that is considered unsafe because it fails 
confidentiality rules being applied. 


For further information about techniques to suppress data see: Confidentiality Information Sheet 4: 
‘How to confidentialise data: the basic principles’ 


See Aggregate data. 


See Frequency rule. 


See De-identified data. 


Used to characterise the situation where an individual can be distinguished from all other 
members in a population or sample in terms of information available on microdata records. 

The existence of uniqueness is determined by the size of the population or sample, the degree 
to which it is segmented by geographic information and the number and detail of characteristics 
provided for each unit in the dataset. 
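
The idea of uniqueness can be sketched by counting how many microdata records share each combination of characteristics; a record is unique when its combination occurs only once. The field names and sample records below are hypothetical.

```python
from collections import Counter

def unique_records(records, keys):
    """Return the records whose combination of values for keys occurs
    exactly once in the dataset."""
    def combo(record):
        return tuple(record[k] for k in keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]

# Hypothetical de-identified unit records
records = [
    {"postcode": "2600", "occupation": "teacher"},
    {"postcode": "2600", "occupation": "teacher"},
    {"postcode": "2600", "occupation": "astronaut"},
]
print(unique_records(records, keys=("postcode", "occupation")))
```

Adding more characteristics to keys, or using finer geographic detail, generally increases the number of unique records, which matches the factors listed above.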


See Microdata. 




